www.fermiqcd.net

Massimo Di Pierro^a,*, Aida X. El-Khadra^b, Steven Gottlieb^c, Andreas S. Kronfeld^d, Paul B. Mackenzie^d,
Masataka Okamoto^d, Mehmet B. Oktay^b and James N. Simone^d (for the FermiQCD Collaboration)

^a School of Computer Science, Telecommunications and Information Systems, DePaul University,
Chicago, IL 60604

^b Department of Physics, University of Illinois, Urbana, IL 61801
^c Department of Physics, Indiana University, Bloomington, IN 47405
^d Fermi National Accelerator Laboratory, P.O. Box 500, Batavia, IL 60510



1. INTRODUCTION 

FermiQCD [1,2] is a C++ library for fast
development of parallel lattice QCD applications.
The expression FermiQCD Collaboration is used
as a collective name to indicate both the users of
the software and its contributors.

One of the main differences between FermiQCD
and libraries developed by other collaborations is
that it follows an object oriented design as
opposed to a procedural design. FermiQCD should
not be identified exclusively with the
implementation of the algorithms but, rather, with the
strict specifications that define its Application
Program Interface. One should think of
FermiQCD as a language on its own (a superset of
the C++ language), designed to describe Lattice
QCD algorithms. The objects of the language
include complex numbers (mdp_complex),
matrices (mdp_matrix), lattices (mdp_lattice), fields
(gauge_field, fermi_field, staggered_field),
propagators (fermi_propagator) and actions.
Algorithms written in terms of these objects are
automatically parallel.

Some of the advantages of our design approach 
are the following: 

• Programs written in FermiQCD are easy to
write, read and modify since the FermiQCD
syntax resembles the mathematical syntax
used in Quantum Field Theory articles and
books.

• Programs are portable in the sense that 
they can, in principle, be compiled with any 
ANSI C++ compiler. Hardware specific op- 
timizations are coded in the library and hid- 
den from the high level programmer. 

• The high level programmer does not have
to deal with parallelization issues since the
underlying objects deal with it. FermiQCD
communications are based on MPI.

• Programs are easier to debug because the 
usage of FermiQCD objects and algorithms 
does not require explicit use of pointers. All 
memory management is done by the objects 
themselves. 

FermiQCD was originally designed with one
main goal in mind: ease of use. It is true
that in many cases this requires a compromise
with speed. For example, FermiQCD actions
and fields support an arbitrary $SU(n_c)$ gauge group
and it is not practical to optimize the linear
algebra for every $n_c$. However, it is convenient to
optimize some of the specific cases of interest
(such as $n_c = 3$) while maintaining compatibility
with the FermiQCD syntax. At present the
FermiQCD library includes multiple
implementations of each action (Wilson, Clover,
Kogut-Susskind, Asqtad [4], Domain Wall, Fermilab [5])
and, for each action, at least one implementation
is optimized for $SU(3)$. In particular, we
provide a FermiQCD port of the Clover $SU(3)$
action coded by M. Lüscher using SSE assembly
instructions for Pentium 4 [6].

The following code declares two fermionic fields
(phi and psi) and one $SU(n_c)$ gauge field (U) on
a given lattice (lattice), reads psi and U from
disk and computes

$\phi = Q[U,\kappa]^{-1} \psi$

where $Q[U,\kappa] = D[U] + m[\kappa]$ is the Dirac matrix
and $\kappa$ is the typical lattice parameter that sets
the mass scale:

fermi_field phi(lattice, nc);
fermi_field psi(lattice, nc);
gauge_field U(lattice, nc);
coefficients light_quark;
psi.load("fermi_field.mdp");
U.load("gauge_field.mdp");
light_quark["kappa"] = 0.123;
if (nc == 3) default_fermi_action =
    FermiCloverActionSSE2::mul_Q;
mul_invQ(phi, psi, U, light_quark);

Note that when nc = 3 the above program uses
the SSE optimized Wilson action. If nc ≠ 3 a
different implementation of the Wilson action is
used. FermiQCD programs are transparent to the
choice of the action.

On a cluster of 2 GHz Pentium 4 PCs connected
by Myrinet, one double precision SU(3) minimum
residue step with the Clover action takes 4.8
microseconds per lattice site. The efficiency drops to
75-80% on 8 processors. For more benchmarks
we refer to the www.fermiqcd.net web page.

FermiQCD programs are implicitly parallel.
Each lattice object determines an optimal
communication pattern for its sites assuming a
next-neighbor, a next-next-neighbor or a
next^3-neighbor interaction in the action. The optimal
patterns are determined according to empirical
rules that minimize dependence on the network
latency and minimize communication load.



2. EXAMPLES 

2.1. Computing the average plaquette 

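Given a $\mu\nu$-plane, the average plaquette, for
each gauge configuration, is defined as

$\langle P_{\mu\nu} \rangle = \frac{1}{V} \sum_x \mathrm{Re\,Tr}\; U_\mu(x)\, U_\nu(x+\hat\mu)$ (1)
$\qquad \times \left[ U_\nu(x)\, U_\mu(x+\hat\nu) \right]^\dagger$ (2)

where $V$ is the number of lattice sites.
Here is the corresponding FermiQCD syntax: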

forallsites(x)
  p = p + real(trace(U(x,mu) * U(x+mu,nu) *
        hermitian(U(x,nu) * U(x+nu,mu))));
p = p / lattice.size();

In the above code forallsites is a parallel loop and
U(x,mu) is an $n_c \times n_c$ color matrix.

2.2. Implementing the Dirac-Wilson action

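As another example, we consider here the
following Wilson discretization of the Dirac action:

$\phi_\alpha(x) = \psi_\alpha(x) - \kappa \sum_\mu \left[ (1+\gamma_\mu)_{\alpha\beta}\, U_\mu(x)\, \psi_\beta(x+\hat\mu) \right.$ (3)
$\qquad \left. + (1-\gamma_\mu)_{\alpha\beta}\, U^\dagger_\mu(x-\hat\mu)\, \psi_\beta(x-\hat\mu) \right]$ (4)

which can be translated into the following
FermiQCD code: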

phi = psi;
forallsites(x)
  for (int mu = 0; mu < 4; mu++) {
    for (int b = 0; b < 4; b++) {
      psi_up(b) = U(x,mu) * psi(x+mu,b);
      psi_dw(b) = hermitian(U(x-mu,mu)) *
                  psi(x-mu,b);
    }
    phi = phi - kappa * ((1 + Gamma[mu]) * psi_up +
                         (1 - Gamma[mu]) * psi_dw);
  }

Note how "1" in this context is interpreted as the
identity matrix. The sum over the spin index is implicit
since psi_up and psi_dw are spin × color matrices.

2.3. Computing the pion propagator 

The pion propagator is defined as

$C_2(x_0) = \mathrm{Re\,Tr} \sum_{x_1,x_2,x_3} \sum_{\alpha,\beta} S_{\alpha\beta}(x)\, S^\dagger_{\beta\alpha}(x)$ (6)

where $S$ is a light quark propagator in the
background gauge field $U$:

$S_{\alpha\beta}(x) = \int [dq]\, q_\alpha(x)\, \bar q_\beta(0)\, e^{-\bar q\, Q[U,\kappa]\, q}$ (7)

Here is the FermiQCD syntax for Eq. (6):

generate(S, U, light_quark);
forallsites(x)
  for (int a = 0; a < 4; a++)
    for (int b = 0; b < 4; b++)
      C2(x(0)) += real(trace(S(x,a,b) *
                  hermitian(S(x,b,a))));

where S(x,a,b) is a color × color matrix. The
function generate creates the propagator S by
calling the inverter (for example the minimum
residue or the stabilized bi-conjugate gradient).
Each of the inverters works on any of the provided
actions and also on user defined actions.

3. CONCLUSIONS 

In our experience the cost of developing and
debugging software is one of the major costs of
Lattice QCD computations. FermiQCD was
designed to standardize this software development
process and, therefore, reduce the cost for the
community.

FermiQCD comes in the form of a library (with
examples), rather than a collection of ready made
programs, since we understand that different
research groups have different requirements. Some
ready made programs for specific computations
(such as generating gauge configurations,
computing meson propagators, etc.) are available as
examples, while others are available upon request
to the authors. We provide converters for the
most common data formats including UKQCD,
MILC, CANOPY and various binary formats.

The FermiQCD libraries can be used freely for 
research purposes and we do not require that 
users share their programs unless they wish to 
do so (although we feel that this practice should 
be encouraged). 

On the one hand, FermiQCD is a mature project.
The present implementation has been tested
independently by different groups and it has been
used to develop large scale lattice QCD
computations. On the other hand, there is a lot of work
that needs to be done, including:

• implementing dynamical fermions 

• developing a graphical interface 

• optimizing the gauge actions 

• porting the low level communications
libraries (currently based on MPI) to other
protocols such as TCP/IP for Gigabit
Ethernet and the SciDAC API for the QCDOC.

• incorporating grid-like features, including
the ability to read and write data in the new
international data grid standard for gauge
configurations.

• implementing new actions such as the 
Iwasaki gauge action and Overlap fermions. 

• building a collection of ready-to-use
programs and related documentation.

The development of FermiQCD has greatly
benefited from access to other codes such as
UKQCD (from which we borrowed the local
random number generator), MILC (for the staggered
fermion algorithms), CANOPY (for many design
features), M. Lüscher's (for the assembly macros)
and C. Michael's (for all-to-all propagators); we
thank their authors for making them available to
us. We also wish to thank J. Flynn, A. Shams,
F. Mescia, L. Del Debbio and T. Rador for their
thorough tests of the inverters and the Clover
action.